Monday - Filter deer data
- 1-2 deers
- Summer and Winter of one year
- Differing sample rate
- 3 hour for weekend vs weekday comparison
- 5 minute for daily rhythms
- for avoidance of routes/crossing use the 5 minute limit
In addition to being the main habitat of European roe deer (Capreolus capreolus), forests provide diverse and outstanding settings for recreational activities such as hiking, horse riding and mountain biking (Jacsman, 1990)
The following hypotheses are therefore being tested:
| Null hypothesis | Alternate hypothesis |
|---|---|
| Deer movement does not vary between weekdays and weekend | Deer movement on average increases during the weekend |
| Dear movement is not constrained by recreational activity | Deer movement is constrained by activity |
deer <- read_csv("data/all_deer_cleaned.csv")
## Rows: 81025 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): reh
## dbl (4): latitude, longitude, y, x
## dttm (1): datetime_utc
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
deer %>%
group_by(reh) %>%
summarise(n = n())
## # A tibble: 15 × 2
## reh n
## <chr> <int>
## 1 RE01 1951
## 2 RE02 6019
## 3 RE03 5403
## 4 RE04 1895
## 5 RE05 6714
## 6 RE06 6187
## 7 RE07 3725
## 8 RE08 6548
## 9 RE09 6639
## 10 RE10 6924
## 11 RE11 5599
## 12 RE12 7289
## 13 RE13 7800
## 14 RE14 2465
## 15 RE15 5867
summary(deer)
## latitude longitude y x
## Min. :47.17 Min. :8.496 Min. :680024 Min. :225196
## 1st Qu.:47.23 1st Qu.:8.521 1st Qu.:681910 1st Qu.:231974
## Median :47.28 Median :8.555 Median :684468 Median :237349
## Mean :47.27 Mean :8.548 Mean :683921 Mean :235882
## 3rd Qu.:47.29 3rd Qu.:8.569 3rd Qu.:685534 3rd Qu.:238217
## Max. :47.30 Max. :8.597 Max. :687703 Max. :239875
## reh datetime_utc
## Length:81025 Min. :2013-09-12 23:01:02
## Class :character 1st Qu.:2014-05-01 17:50:11
## Mode :character Median :2014-10-22 17:01:18
## Mean :2014-10-21 10:35:57
## 3rd Qu.:2015-03-26 11:00:44
## Max. :2016-07-04 23:20:21
str(deer)
## spec_tbl_df [81,025 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ latitude : num [1:81025] 47.3 47.3 47.3 47.3 47.3 ...
## $ longitude : num [1:81025] 8.57 8.57 8.57 8.57 8.57 ...
## $ y : num [1:81025] 685545 685552 685534 685557 685634 ...
## $ x : num [1:81025] 235920 235919 235917 235933 235946 ...
## $ reh : chr [1:81025] "RE01" "RE01" "RE01" "RE01" ...
## $ datetime_utc: POSIXct[1:81025], format: "2013-09-12 23:01:30" "2013-09-13 02:01:00" ...
## - attr(*, "spec")=
## .. cols(
## .. latitude = col_double(),
## .. longitude = col_double(),
## .. y = col_double(),
## .. x = col_double(),
## .. reh = col_character(),
## .. datetime_utc = col_datetime(format = "")
## .. )
## - attr(*, "problems")=<externalptr>
The deer dataset was provided from a multi-year (2014-2016) research project conducted in partnership with the Wildnispark Sihlwald. The Wildnispark Zurich consists of a protected core area and a surrounding area with high recreational use due to its proximity to the city of Zurich. The dataset tracks the movement of 14 deer using GPS trackers. Each fix point in the dataset contains the following attributes:
| Name | Units | Purpose |
|---|---|---|
| Latitude | Degrees | Positioning data |
| Longitude | Degrees | Positioning data |
| x | Positioning data (projected) | |
| y | Positioning data (projected) | |
| reh | ID of individual | |
| datetime | Time |
The characteristics of the data limit the computational movement analysis methods that can be applied.
| Col1 | Col2 | Col3 | Col4 |
|---|---|---|---|
Activity data from Strava was gathered by analysing the publicly available global heatmap using a script provided by Nils Ratnaweera. Colour values from the raster data are used to indicate the number of activities was converted into . This provides a relative indication of how often a route is frequented. Whilst all activity types (cycling, running etc.) were gathered, the dataset does not allow for differentiation between activities and contains no temporal data.
Street and hiking path data was sourced from swisstopo.
## Relevant functions
rle_id <- function(vec){
x <- rle(vec)$lengths
as.factor(rep(seq_along(x), times=x))
}
## lets assign some useful, basic movement parameters
speed_calculator <- function(df_input){
df_input %>%
group_by(reh) %>%
mutate(timelag = as.integer(difftime(lead(datetime_utc), datetime_utc, units = "secs")),
steplength = sqrt((y - lead(y,1))^2 + (x - lead(x,1))^2),
timelag_minutes = timelag/60,
speed_ms = steplength/timelag,
speed_kph = speed_ms * 3.6) -> df_output
return(df_output)
}
## Static points¨
step_calculator <- function(df_input){
mutate(
nMinus3 = sqrt((lag(y, 3)-y)^2 + (lag(x, 3)-x)^2),
nMinus2 = sqrt((lag(y, 2)-y)^2+(lag(x, 2)-x)^2),
nMinus1 = sqrt((lag(y, 1)-y)^2+(lag(x, 1)-x)^2),
nPlus1 = sqrt((y-lead(y, 1))^2+(x-lead(x, 1))^2),
nPlus2 = sqrt((y-lead(y, 2))^2+(x-lead(x, 2))^2),
nPlus3 = sqrt((y-lead(y, 3))^2+(x-lead(x, 3))^2)) %>%
rowwise(.) %>%
mutate(
stepMean = mean(c(nMinus3, nMinus2, nMinus1, nPlus1, nPlus2, nPlus3))) %>%
ungroup() %>%
mutate(static = stepMean < mean(stepMean, na.rm = TRUE)) -> df_output
return(df_output)
}
## Reading geometry, setting LV95 as coordinates
## Getting weekdays, timelag, stepMean and Speed data
deer %>%
st_as_sf(., coords = c("longitude", "latitude"), crs = 4326) %>%
st_transform(., crs = 2056) %>%
mutate(x = unlist(map(.$geometry,1)),
y = unlist(map(.$geometry,2))) %>%
mutate(wdays = lubridate::wday(datetime_utc, label = TRUE),
week_start = getOption("lubridate.week.start", -1),
month = factor(format(datetime_utc,"%B"), levels = month.name),
year = format(datetime_utc, "%Y")) %>%
speed_calculator(.)-> deer
### Need to load the street/hiking data
# First create a polygon with bounding boc
x_coord <- c( 2679000, 2687981, 2687981, 2679000)
y_coord <- c(1241000, 1241000, 1228484, 1228484)
xym <- cbind(x_coord, y_coord)
library(sp)
p = Polygon(xym)
ps = Polygons(list(p),1)
sps = SpatialPolygons(list(ps))
strassen <- read_sf("/Users/nikitakrahenbuhl/Documents/Masters/PandT/project/datachallenge-n/data/swiss_TLM_data.gpkg","strassen")
boden <- read_sf("/Users/nikitakrahenbuhl/Documents/Masters/PandT/project/datachallenge-n/data/swiss_TLM_data.gpkg","bodenbedeckung")
strassen %>% filter(grepl("Weg|Autobahn|3m Strasse", OBJEKTART)) -> strassen_freizeit
strassen %>% filter(!is.na(WANDERWEGE)) -> wanderwege
source("strava.R")
#strava_get(zoom = 11)
library(terra)
## terra 1.5.21
##
## Attaching package: 'terra'
## The following object is masked from 'package:MASS':
##
## area
## The following object is masked from 'package:ggpubr':
##
## rotate
## The following object is masked from 'package:dplyr':
##
## src
## The following object is masked from 'package:tidyr':
##
## extract
## The following object is masked from 'package:ggplot2':
##
## arrow
strava <- rast("strava_all_11_hot/combined/05_cropped.tif") %>% crop(., sps)
plot(strava)
## Temporal overview
ggplot(data = deer, aes(x = datetime_utc, y = reh)) +
geom_line()
## Sampling schedule
ggplot(deer) + geom_boxplot(aes(x=reh, y=timelag_minutes))
## Warning: Removed 15 rows containing non-finite values (stat_boxplot).
ggplot(deer) + geom_histogram(aes(x=timelag_minutes), bins = 200)
## Warning: Removed 15 rows containing non-finite values (stat_bin).
ggplot(deer) + geom_boxplot(aes(x=reh, y=steplength))
## Warning: Removed 15 rows containing non-finite values (stat_boxplot).
ggplot(deer) + geom_histogram(aes(x=steplength), bins = 100)
## Warning: Removed 15 rows containing non-finite values (stat_bin).
ggplot(deer, aes(x=datetime_utc, y = timelag_minutes, color = reh)) + geom_line() + facet_wrap(~reh)
## Warning: Removed 15 row(s) containing missing values (geom_path).
## Initial spatial exploration
deer %>%
group_by(reh) %>%
summarise(.) %>%
st_convex_hull(.) -> mcp
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(mcp) + tm_polygons("reh", alpha = 0.4) +
tm_shape(wanderwege) + tm_lines(col = "tomato3")
Multiple sampling regimes, in intervals of 180 and 5 minutes were used. This data was consequently split, with long sampling intervals used to examine variations within a week. The shorter sampling intervals were used to analyse daily variations in movement and interactions with recreational infrastructure. There are clearly some temporal outliers, which were removed by sampling +- 2 minutes of the identified sampling intervals. Spatial outliers were roughly removed by filtering the data to a steplength of less than 2500 meters.
Due to the large amount of deer groups and temporal timescale a subset of deer groups must be chosen that interact with recreational infrastructure, are independent and contain concurrently recorded GPS tracks.
was chosen for analysis, as they are fairly dispersed and interact with recreational infrastructure. The multi-annual sampling period was also reduced to 2014, as there was continuous and well-sampled data for the chosen deer groups.
## Are some deer on average more active (disturbed) than others
deer %>%
st_drop_geometry() %>%
filter(between(timelag_minutes, 178, 182) & steplength < 2500 & year == 2014) %>%
group_by(reh) %>% summarise(avg_steplength_long = mean(steplength)) -> df
deer %>%
st_drop_geometry() %>%
filter(between(timelag_minutes, 3, 7) & steplength < 2500 & year == 2014) %>%
group_by(reh) %>% summarise(avg_steplength_short = mean(steplength)) -> df1
merge(df, df1) -> df2
active_deer <- c("RE15", "RE08", "RE11", "RE06")
#active_deer <- c("RE06")
deer %>%
filter(between(timelag_minutes, 178, 182) &
steplength < 2500 &
reh %in% active_deer &
year == 2014) -> deer_long
deer_long %>%
ggplot(., aes(x=x, y=y, color=reh)) + geom_path() + geom_point() +
coord_fixed()
deer_long %>%
st_drop_geometry(.) %>%
mutate(month = format(datetime_utc,"%m"), year = format(datetime_utc, "%Y")) %>%
group_by(reh, year, month, wdays) %>%
summarise(mean_steplength = mean(steplength), mean_speed = mean(speed_ms), n=n()) %>%
ggplot() + geom_tile(aes(x=wdays, y = month, fill = n)) + facet_grid(reh~year)
## `summarise()` has grouped output by 'reh', 'year', 'month'. You can override
## using the `.groups` argument.
deer %>%
mutate(year = format(datetime_utc, "%Y")) %>%
filter(between(timelag_minutes, 3, 7) &
steplength < 2500 &
reh %in% active_deer &
year == 2014) -> deer_short
deer_short %>%
ggplot(., aes(x=x, y=y, color=reh)) + geom_path() + geom_point() +
coord_fixed()
deer_short %>%
st_drop_geometry(.) %>%
mutate(month = format(datetime_utc,"%m"), year = format(datetime_utc, "%Y")) %>%
group_by(reh, year, month, wdays) %>%
summarise(mean_steplength = mean(steplength), mean_speed = mean(speed_ms), n=n()) %>%
ggplot() + geom_tile(aes(x=wdays, y = month, fill = n)) + facet_grid(reh~year)
## `summarise()` has grouped output by 'reh', 'year', 'month'. You can override
## using the `.groups` argument.
A subset of data was defined.
GPS Tracks for deer was
How did I handle outliers?
Now lets have a look at visual variant
Would be really interesting to calculate steplength or speed within a day and then split it up by day of the week
deer_long %>%
st_drop_geometry() %>%
group_by(reh, month, wdays) %>%
summarise(avg_steplength = mean(steplength, na.rm = TRUE)) %>%
ggplot(., aes(x=wdays, y = avg_steplength)) + geom_bar(stat="identity", position = position_dodge(), width=.5)
## `summarise()` has grouped output by 'reh', 'month'. You can override using the
## `.groups` argument.
deer_long %>%
st_drop_geometry() %>%
group_by(reh, month, wdays) %>%
summarise(avg_steplength = mean(steplength, na.rm = TRUE)) %>%
ggplot(., aes(x=wdays, y = avg_steplength)) + geom_bar(stat="identity", position = position_dodge(), width=.5) + facet_wrap(~month)
## `summarise()` has grouped output by 'reh', 'month'. You can override using the
## `.groups` argument.
deer_long %>%
st_drop_geometry() %>%
mutate(hour = hour(round_date(datetime_utc, "hour"))) %>%
group_by(reh, month, wdays, hour) %>%
summarise(avg_steplength = mean(steplength, na.rm = TRUE)) %>%
ggplot(., aes(x = hour, y = avg_steplength, color = reh)) + geom_line() + geom_point() + facet_grid(month~wdays)
## `summarise()` has grouped output by 'reh', 'month', 'wdays'. You can override
## using the `.groups` argument.
deer %>%
mutate(hour = hour(round_date(datetime_utc, "hour"))) %>%
group_by(reh, wdays, hour) %>%
summarise(mean_steplength = mean(steplength), mean_speed = mean(speed_kph)) %>%
ggplot(data = ., aes(x=hour, y=mean_steplength, color = reh)) + geom_line() + facet_wrap(~wdays)
## `summarise()` has grouped output by 'reh', 'wdays'. You can override using the
## `.groups` argument.
deer_short %>%
group_by(reh, wdays) %>% summarise(mean_steplength = mean(steplength)) %>%
ggplot(data = ., aes(x=wdays, y=mean_steplength, color = reh)) + geom_line() + geom_point()
## `summarise()` has grouped output by 'reh'. You can override using the `.groups`
## argument.
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
Statistical Variant
deer %>%
ggplot(.) + geom_boxplot(aes(x=wdays, y = log(steplength)))
## Warning: Removed 30 rows containing non-finite values (stat_boxplot).
deer %>% drop_na(steplength) %>% filter(between(steplength, 1, 3000)) %>%
aov(log(steplength) ~ wdays, data = .) -> aov1
plot(aov1)
summary(aov1)
## Df Sum Sq Mean Sq F value Pr(>F)
## wdays 6 1797 299.54 181 <2e-16 ***
## Residuals 80668 133507 1.66
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tk_hsd <- TukeyHSD(aov1)
library(multcompView)
names <- names(tk_hsd)
p_vals <- extract_p(tk_hsd)
multcompLetters(unlist(p_vals))
## wdays.Mon wdays.Tue wdays.Wed wdays.Thu wdays.Fri wdays.Sat Sun
## "abcdefg" "acdef" "abef" "aefh" "acf" "a" "hi"
## Mon Tue Wed Thu Fri
## "bgi" "cdgi" "bdeghi" "bdeghi" "bcdefghi"
pdf()
ggplot() +
geom_sf(data = boden, aes(fill = OBJEKTART)) + scale_fill_brewer(palette="Dark2") +
geom_sf(data = strassen %>% filter(!is.na(WANDERWEGE)), color = "red") +
geom_sf(data = deer, aes(color=reh))
dev.off()
## quartz_off_screen
## 2
Now adding the strava data
terra::extract(strava, vect(wanderwege), fun=mean) %>%
cbind(wanderwege, .) %>%
st_as_sf(.) %>% rename(popularity = X01_combined) -> wanderwege
wanderwege %>% st_buffer(5) -> wanderwege5
wanderwege %>% st_buffer(10) -> wanderwege10
wanderwege %>% st_buffer(30) -> wanderwege30
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(mcp) + tm_polygons("reh", alpha = 0.4) +
tm_shape(wanderwege30) + tm_polygons(col = "popularity")